2 research outputs found

    Real time stream processing for Internet of things and sensing environments

    Includes bibliographical references. 2015 Fall. Improvements in miniaturization and networking capabilities of sensors have contributed to the proliferation of Internet of Things (IoT) and continuous sensing environments. Data streams generated in such settings must be processed in real time, at a pace that matches their generation rates. Challenges in accomplishing this include high data arrival rates, buffer overflows, context switches during processing, and object creation overheads. We propose a holistic framework that addresses the CPU, memory, network, and kernel issues involved in stream processing. Our prototype, Neptune, builds on the Granules cloud runtime and leverages its support for scheduling packets and communications based on publish/subscribe, peer-to-peer, and point-to-point. The framework maximizes bandwidth utilization in the presence of small messages via the use of buffering and dynamic compaction of packets based on their entropy. Our use of thread pools and batched processing reduces context switches and improves effective CPU utilization. The framework alleviates memory pressure that can lead to swapping, page faults, and thrashing through efficient reuse of objects. To cope with buffer overflows we rely on flow control and throttling the preceding stages of a processing pipeline. Our correctness criteria included deadlock/livelock avoidance, and ordered and exactly-once processing. Our benchmarks demonstrate the suitability of the Granules/Neptune combination, and we contrast our performance with Apache Storm, the dominant stream-processing framework developed by Twitter. At a single node, we are able to achieve a processing rate of ~2 million stream packets per second. In a distributed cluster setup, we are able to achieve a processing rate of ~100 million stream packets per second with near-optimal bandwidth utilization.

    Near real-time processing of voluminous, high-velocity data streams for continuous sensing environments

    Includes bibliographical references. 2020 Summer. Recent advancements in miniaturization, falling costs, networking enhancements, and battery technologies have contributed to a proliferation of networked sensing devices. Arrays of coordinated sensing devices are deployed in continuous sensing environments (CSEs) where the phenomena of interest are monitored. Observations sensed by devices in a CSE setting are encapsulated as multidimensional data streams that must subsequently be processed. The vast number of sensing devices, the high rates at which data are generated, and the high resolutions at which these measurements are performed contribute to the voluminous, high-velocity data streams that are now increasingly pervasive. These data streams must be processed in near real-time to power user-facing applications such as visualization dashboards and monitoring systems, as well as various stages of data ingestion pipelines such as ETL pipelines. This dissertation focuses on facilitating efficient ingestion and near real-time processing of voluminous, high-velocity data streams originating in CSEs. Challenges in ingesting and processing such streams include energy and bandwidth constraints at the data sources, data transfer and processing costs, underutilized resources, and preserving the performance of stream processing applications in the presence of variable workloads and system conditions. Toward this end, we explore design principles to build a high-performance and adaptive stream processing engine to address processing challenges that are unique to CSE data streams. Further, we demonstrate how our holistic methodology, based on space-efficient representations of data streams through a controlled trade-off of accuracy, can substantially alleviate stream ingestion challenges while improving stream processing performance. We evaluate the efficacy of our methodology using real-world streaming datasets in a large-scale setup and contrast it against state-of-the-art developments in the field.